Projecting scRNA-seq data onto a reference map
R environment
First, check package dependencies and install ProjecTILs
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS = "true")
options(timeout = max(300, getOption("timeout")))
if (!requireNamespace("renv")) install.packages("renv")
library(renv)
renv::restore()* The library is already synchronized with the lockfile.
remotes::install_github("carmonalab/scGate")
remotes::install_github("carmonalab/ProjecTILs")
library(ProjecTILs)
library(Seurat)Load reference atlas and query data
First, load the default reference TIL atlas. If no reference map file
is provided, the function load.reference.map() will
automatically download it from https://doi.org/10.6084/m9.figshare.12478571
ref <- load.reference.map()[1] "Loading Default Reference Atlas..."
[1] "/Users/mass/Documents/Projects/Cell_clustering/ProjecTILs.demo/ref_TILAtlas_mouse_v1.rds"
[1] "Loaded Reference map ref_TILAtlas_mouse_v1"
Let’s explore the reference atlas
refCols <- c("#edbe2a", "#A58AFF", "#53B400", "#F8766D", "#00B6EB", "#d1cfcc", "#FF0000",
"#87f6a5", "#e812dd")
DimPlot(ref, label = T, cols = refCols)See expression of important marker genes across reference subtypes
markers <- c("Cd4", "Cd8a", "Ccr7", "Tcf7", "Pdcd1", "Havcr2", "Tox", "Izumo1r",
"Cxcr6", "Xcl1", "Gzmb", "Gzmk", "Ifng", "Foxp3")
VlnPlot(ref, features = markers, stack = T, flip = T, assay = "RNA")Now let’s load a query dataset - Miller et al., Nature Immunol (2019)
# A sample data set is provided with the ProjecTILs package
querydata <- ProjecTILs::query_example_seuratMore generally, it is possible to load a query matrix with gene names and barcodes (e.g. 10X format or raw counts)
## Raw count matrix from GEO
library(GEOquery)
geo_acc <- "GSE86028"
getGEOSuppFiles(geo_acc)
fname2 <- sprintf("%s/GSE86028_TILs_sc_wt_mtko.tpm.log2.txt.gz", geo_acc)
querydata2 <- read.sc.query(fname2, type = "raw.log2")Run Projection algorithm
query.projected <- Run.ProjecTILs(querydata, ref = ref)[1] "Using assay RNA for query"
[1] "37 out of 1501 ( 2% ) non-pure cells removed. Use filter.cells=FALSE to avoid pre-filtering"
[1] "Aligning query to reference map for batch-correction..."
Projecting corrected query onto Reference PCA space
Projecting corrected query onto Reference UMAP space
NB: by default, Run.ProjecTILs() will
pre-filter T cells using scGate. In case the
input dataset is already pre-filtered, you can disable this step using
Run.ProjecTILs(querydata, ref=ref, filter.cells = FALSE).
If you are using a custom reference map that is not composed of T cells,
you may specify a different scGate filter using the
scGate_model parameter.
Visualize projection
Plot projection of new data over the reference in UMAP space. The contour lines display the density of projected query cells onto the reference map.
plot.projection(ref, query.projected, linesize = 0.5, pointsize = 0.5)Plot the predicted composition of the query in terms of reference T cell subtypes
plot.statepred.composition(ref, query.projected, metric = "Percent")How do the gene expression levels compare between reference and query for the different cell states?
plot.states.radar(ref, query = query.projected)Compare cell states across conditions
If we have multiple conditions (e.g. control vs. treatment, or samples from different tissues), we can search for discriminant genes between conditions (otherwise, by default this analysis is performed using the reference as the ‘control’)
# Simulate a condition which e.g. increases Gzmb expression compared to control
query.control <- subset(query.projected, subset = Gzmb < 1.5)
query.perturb <- subset(query.projected, subset = Gzmb >= 1.5)
plot.states.radar(ref, query = list(Control = query.control, Query = query.perturb))In this toy example, where we simulated a condition that increases Gzmb expression compared to control, we expect cytotoxicity genes to drive differences.
discriminantGenes <- find.discriminant.genes(ref = ref, query = query.perturb, query.control = query.control,
state = "CD8_Tex")
head(discriminantGenes, n = 10) p_val avg_log2FC pct.1 pct.2 p_val_adj
Gzmb 1.161287e-109 2.9169382 1.000 0.803 3.175075e-105
Gzmc 1.874772e-41 2.9937927 0.796 0.368 5.125813e-37
Lgals1 1.781673e-30 0.4869029 1.000 1.000 4.871272e-26
AA467197 1.991773e-26 1.2650526 0.831 0.503 5.445707e-22
Mt1 5.427822e-23 1.2466189 0.764 0.456 1.484021e-18
Ccl9 6.506134e-20 1.6393067 0.481 0.124 1.778842e-15
Serpinb6b 3.348957e-17 0.8031924 0.847 0.601 9.156382e-13
Ccl3 5.352997e-17 1.3944983 0.897 0.736 1.463563e-12
Prf1 5.455189e-17 0.6208702 0.492 0.161 1.491503e-12
Gzme 7.032792e-17 2.1247902 0.452 0.145 1.922836e-12
We can use a volcano plot to display differentially expressed genes:
library(EnhancedVolcano)
EnhancedVolcano(discriminantGenes, lab = rownames(discriminantGenes), x = "avg_log2FC",
y = "p_val", pCutoff = 1e-09, FCcutoff = 0.5, labSize = 5, legendPosition = "none",
drawConnectors = F, title = "Gzmb_high vs. Gzmb_low (Tex)")Using a random subsetting, p-values should not be significant:
rand.list <- ProjecTILs:::randomSplit(query.projected, n = 2, seed = 1)
discriminantGenes <- find.discriminant.genes(ref = ref, query = rand.list[[1]], query.control = rand.list[[2]],
state = "CD8_Tex")
EnhancedVolcano(discriminantGenes, lab = rownames(discriminantGenes), x = "avg_log2FC",
y = "p_val", pCutoff = 1e-09, FCcutoff = 0.5, labSize = 5, legendPosition = "none",
drawConnectors = F, title = "Random split (Tex)")Using ProjecTILs as a classifier
If you do not wish to embed your query data into the reference space, you may also simply use ProjecTILs as a cell type classifier. This may be useful to annotate cell types in your query without altering existing embeddings.
See the query dataset in unsupervised low-dim embeddings:
querydata <- querydata |>
FindVariableFeatures(nfeatures = 500) |>
ScaleData() |>
RunPCA(npcs = 10) |>
RunUMAP(dims = 1:10)
DimPlot(querydata)The ProjecTILs.classifier function applies
reference-projection but does not alter the current embeddings.
querydata <- ProjecTILs.classifier(query = querydata, ref = ref)[1] "Using assay RNA for query"
[1] "37 out of 1501 ( 2% ) non-pure cells removed. Use filter.cells=FALSE to avoid pre-filtering"
[1] "Aligning query to reference map for batch-correction..."
Projecting corrected query onto Reference PCA space
Projecting corrected query onto Reference UMAP space
DimPlot(querydata, group.by = "functional.cluster")We can confirm that most of the cells were classified as CD8_Tex. Please note that filtered cells (i.e. those that were removed by the scGate filter) are assigned the NA label, as they cannot be confidently projected into the reference.
Further reading
For applications of ProjecTILs to gain biological insight on several public datasets please see the ProjecTILs case studies - INDEX - Repository
To generate your own reference map for ProjecTILs see Custom reference map tutorial
Publication: Andreatta et al Nat. Comm. 2021
ProjecTILs repository